Excel BI - Excel Challenge 806

excel-challenges
excel-formulas
🔰 Extract Aadhar and PAN numbers from given strings.
Published

March 24, 2026

Illustration for Excel BI - Excel Challenge 806

Challenge Description

🔰 Extract Aadhar and PAN numbers from given strings. In case of multi answer, extract the first occurrence.

Solutions

library(tidyverse)
library(readxl)

path = "Excel/800-899/806/806 Extract Aadhar and PAN.xlsx"
input = read_excel(path, range = "A2:B11")
test  = read_excel(path, range = "C2:D11")

result = input %>%
  mutate(Aadhar = str_extract(Strings, "\\d{12}") %>% as.numeric(),
         PAN = str_extract(Strings, "[A-Z]{5}\\d{4}[A-Z]")) %>%
  select(-c(Strings, Names))

all.equal(result, test, check.attributes = FALSE)
# [1] TRUE
  • Logic: Read the workbook ranges needed for the challenge; Derive the required intermediate columns; Parse the packed text or string structure.
  • Strengths: The solution stays close to the text pattern itself, which makes the extraction logic easy to audit.
  • Areas for Improvement: The solution assumes the workbook layout and selected ranges remain stable, so any structural change in the sheet would require small adjustments.
  • Gem: A small number of well-targeted text patterns does most of the heavy lifting.
import pandas as pd
pd.set_option('display.float_format', '{:.0f}'.format)  # To display floats without scientific notation 

path = "800-899/806/806 Extract Aadhar and PAN.xlsx"
input = pd.read_excel(path, usecols="A:B", skiprows=1, nrows=10)
test = pd.read_excel(path, usecols="C:D", skiprows=1, nrows=10)

result = input.copy()
result['Aadhar'] = result['Strings'].astype(str).str.extract(r"(\d{12})", expand=False).astype('float64')
result['PAN'] = result['Strings'].astype(str).str.extract(r"([A-Z]{5}\d{4}[A-Z])")[0]
result = result.drop(columns=['Strings', 'Names'])

print(pd.concat([result, test], axis=1).to_string(index=False))

#       Aadhar        PAN          Aadhar        PAN
# 652381472095 ABCDE1234F    652381472095 ABCDE1234F
#          NaN PQWRT9087K             NaN PQWRT9087K
# 721468590342        NaN    721468590342        NaN
# 503192071184 KJHGF7821L    503192071184 KJHGF7821L
# 614573829570        NaN    614573829570        NaN
# 487210986345        NaN    487210986345        NaN
# 395076421813 GHJKL7812R    395076421813 GHJKL7812R
#          NaN ASDFG2301W             NaN ASDFG2301W
# 826419534708 QWERT9876P    826419534708 QWERT9876P

The Python version expresses the core extraction rule directly and keeps the pattern matching easy to review.

Difficulty Level

Medium

The individual steps are manageable, but the correct transformation pattern is not obvious from the raw data.